Why visualize data?
Good visualizations can:
Give powerful summaries of the underlying data
Communicate insights, often to audiences who do not have the luxury of spending as much time with the data as you do.
As a data analyst or data scientist, it’s your responsibility to provide the necessary high-level summaries or takeaways in every data visual you create.
Some Features of Good Visualizations
Clear on what they’re communicating
Well-defined axes, with the right scaling and labels
Good choice of colors and annotations (visually appealing)
Less is more
Some Features of Bad Visualizations
Cluttered: too much going on in the chart, with no clear communication goal
Truncated axes that start at a non-zero value, which exaggerates differences and distorts interpretation
Poor choice of colors
Unnecessary 3D-fying
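To see why a truncated axis misleads, consider a bar chart. It encodes each value as a bar length measured from the axis baseline. If the axis starts at a baseline b instead of 0, a value v is drawn with length v - b. The minimal Python sketch below compares the true ratio of two values with the ratio a reader perceives from the drawn bars. The function name `apparent_ratio` and the numbers are hypothetical and chosen only for illustration.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths for values a and b when the axis starts at `baseline`."""
    if not (a > baseline and b > baseline):
        raise ValueError("both values must lie above the baseline to be drawn as bars")
    # Each bar is drawn from the baseline, so its visible length is value - baseline.
    return (a - baseline) / (b - baseline)


# Hypothetical values: 105 vs. 100 differ by only 5%.
true_ratio = apparent_ratio(105, 100)               # axis starts at zero
perceived_ratio = apparent_ratio(105, 100, baseline=95)  # axis truncated at 95

print(f"true ratio:      {true_ratio:.2f}")       # 1.05
print(f"perceived ratio: {perceived_ratio:.2f}")  # 2.00
```

With the axis starting at 95, a 5% difference is drawn as one bar twice as long as the other.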
Our data for today - Netflix Movies & TV Shows
```
##    show_id              type              title             director        
##  Length:8807        Length:8807        Length:8807        Length:8807       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##      cast             country           date_added         release_year 
##  Length:8807        Length:8807        Length:8807        Min.   :1925  
##  Class :character   Class :character   Class :character   1st Qu.:2013  
##  Mode  :character   Mode  :character   Mode  :character   Median :2017  
##                                                           Mean   :2014  
##                                                           3rd Qu.:2019  
##                                                           Max.   :2021  
##     rating            duration          listed_in         description       
##  Length:8807        Length:8807        Length:8807        Length:8807       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
```
| Name | netflix |
|---|---|
| Number of rows | 8807 |
| Number of columns | 12 |
| Column type frequency: | |
| character | 11 |
| numeric | 1 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| show_id | 0 | 1.00 | 2 | 5 | 0 | 8807 | 0 |
| type | 0 | 1.00 | 5 | 7 | 0 | 2 | 0 |
| title | 0 | 1.00 | 1 | 104 | 0 | 8807 | 0 |
| director | 2634 | 0.70 | 2 | 208 | 0 | 4528 | 0 |
| cast | 825 | 0.91 | 3 | 771 | 0 | 7692 | 0 |
| country | 831 | 0.91 | 4 | 123 | 0 | 748 | 0 |
| date_added | 10 | 1.00 | 11 | 18 | 0 | 1714 | 0 |
| rating | 4 | 1.00 | 1 | 8 | 0 | 17 | 0 |
| duration | 3 | 1.00 | 5 | 10 | 0 | 220 | 0 |
| listed_in | 0 | 1.00 | 6 | 79 | 0 | 514 | 0 |
| description | 0 | 1.00 | 61 | 248 | 0 | 8775 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| release_year | 0 | 1 | 2014.18 | 8.82 | 1925 | 2013 | 2017 | 2019 | 2021 | ▁▁▁▁▇ |
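The `complete_rate` column in the tables above is the share of non-missing values in each column, 1 - n_missing / n, where n is the number of rows (8807). The short Python sketch below recomputes it with plain arithmetic from the `n_missing` counts in the character table, as a sanity check. The helper name `complete_rate` is reused here for illustration only.

```python
N_ROWS = 8807  # number of rows in the netflix table

# n_missing counts copied from the skim output above
n_missing = {
    "director": 2634,
    "cast": 825,
    "country": 831,
    "date_added": 10,
    "rating": 4,
    "duration": 3,
}


def complete_rate(missing: int, n: int = N_ROWS) -> float:
    """Share of non-missing values in a column: 1 - missing / n."""
    return 1 - missing / n


for column, missing in n_missing.items():
    # Rounded to two decimals, matching the precision of the skim table
    print(f"{column:<11} {complete_rate(missing):.2f}")
```

Rounded to two decimals, these values reproduce the table: 0.70 for `director`, 0.91 for `cast` and `country`, and 1.00 for the rest. Roughly 30% of titles have no listed director.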
Some Bad Visualizations
Example 1
What’s wrong with that plot?
The visualization is bad because:
It’s vague: lumping all movie ratings together does not help the audience identify what you’re trying to communicate.
There are too many rating categories. Remember, good visuals give high-level summaries (less is more).
The pie chart used here is not the best tool for comparing multiple categories.
Pie charts also make it difficult for your audience to judge the relative sizes of the slices.
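One way to act on these points is to collapse the ratings into a few broad audience groups and show the counts as a sorted bar chart instead of a pie. The minimal Python sketch below assumes a hypothetical grouping of common US film and TV rating codes into "Kids & Family", "Teens", "Adults" and "Other". Both the grouping and the sample list are illustrative and are not taken from the original analysis. A text bar stands in for whichever plotting library you use.

```python
from collections import Counter

# Hypothetical grouping of rating codes into broad audience groups.
RATING_GROUPS = {
    "Kids & Family": {"G", "TV-G", "TV-Y", "TV-Y7", "TV-Y7-FV", "PG", "TV-PG"},
    "Teens": {"PG-13", "TV-14"},
    "Adults": {"R", "NC-17", "TV-MA"},
}


def audience_group(rating):
    """Map one rating code to a broad group; unknown or missing codes go to 'Other'."""
    for group, codes in RATING_GROUPS.items():
        if rating in codes:
            return group
    return "Other"


def grouped_counts(ratings):
    """Count titles per audience group, largest first (the order a bar chart should use)."""
    counts = Counter(audience_group(r) for r in ratings)
    return counts.most_common()


# Illustrative sample, not the real Netflix data.
sample = ["TV-MA", "TV-14", "TV-MA", "R", "PG-13", "TV-Y7", "NR", None, "TV-MA", "TV-PG"]
for group, n in grouped_counts(sample):
    print(f"{group:<14} {'#' * n} {n}")
```

Sorting the bars by count lets readers compare lengths on a common baseline, which is exactly what a pie chart's slices make hard.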
Let’s look at another.
Example 2
Examples of Good Visualizations
Example 1
Example 2
Visit this GitHub repo for the code.